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ABSTRACT 

Wide-field radio interferometric telescopes such as the Square Kilometre Array now being designed are subject to 
a number of aberrations. One particularly pernicious aberration is that due to non-coplanar baselines whereby 
long baselines incur a quadratic image-plane phase error. There are numerous algorithms for dealing with the 
non-coplanar baselines effect. As a result of our experience with developing processing software for the Australian 
Square Kilometre Array Pathfinder, we advocate the use of a hybrid algorithm, called w snapshots, based on 
a combination of w projection and snapshot imaging. This hybrid overcomes some of the deficiencies of each 
and has advantages from both. Compared to pure w projection, w snapshots uses less memory and execution 
time, and compared to pure snapshot imaging, w snapshots uses less memory and is more accurate. At the 
asymptotes, w snapshots devolves to w projection and to snapshots. 
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1. THE SQUARE KILOMETRE ARRAY 

The Square Kilometre Array (SKA^ is a world class radio telescope now being constructed by an international 
consortium of close to a dozen countries. It is being constructed in two phases. Phase 1 planned to start 
observations in 2018, and a subsequent roughly ten times larger Phase 2 planned to start observations in 2022. 
SKAl covers observing frequencies from 70MHz to 3GHz, with three distinct receptor technologies. 

• SKA1_DISH An array of 250 15m diameter parabolic dishes with single pixel feeds, 

• SKA1_SURVEY An array of 96 15m diameter parabolic dishes with phased array feeds 

• SKA1_AA_L0W An array of 250 aperture array stations, each station having 11,400 active antennas 
(roughly dipoles). 

In Phase 2, SKA1_AA_L0W would be grown to lower baselines, SKA1_DISH would be grown in number of 
antennas by close to an order of magnitude, and depending on technology demonstration, there may also be a 
mid-frequency range aperture array telescope. 

Taken together, these telescopes will form the Square Kilometre Array Observatory. In Phase 1, SKAO will 
be located in both Austraha (SKAl.SURVEY, SKA1_AA_L0W) and South Africa (SKA1_DISH). 

The science case for SKA is very well-developed.l^ It spans a wide range of pressing topics in astronomy and 
astrophysics. 

The data processing for the SKA will be very demanding. High sensitivity implies many measurements for 
processing. In addition, high sensitivity imaging requires high dynamic range since at these wavelengths the 
radio sky is bright with sources. In addition, all three telescopes will have wide fields of view, for which the 
processing is intrinsically demanding. It is this last aspect we concentrate on in this paper. Over the last 5 years, 
we have been engaged in developing the Australia Square Kilometre Array Pathfinder (ASKAP),f2l which is a 
smaller version of SKAl .SURVEY. As part of the construction of ASKAP, we have developed a software system, 
ASKAPSoft, for processing ASKAP observations into science products. We will describe what we believe to be 
the optimal wide-field processing algorithm for ASKAP data and how it will scale to SKA. 
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2. WIDE FIELD IMAGING 

A major obstacle to wide field imaging with radio synthesis arrays is the non-coplanar baselines efFectP This 
prevents the use of a simple two-dimensional transform to form an image of the sky from the measured visibilities. 

To understand the w term, we first recall that for small fields of view, the visibility obeys: 

Viu,v)^ J I{l,m)e^^-^''''^+''"''^ dldm. (1) 
For larger fields of view, the w-term becomes important 



Viu,v) ^ I e V J J dldm. (2) 

1 — P — m^ 

The new term w is the component of antenna-antenna vector towards the phase centre of the field of view. The 
physical origin of this phase term is straightforward - it comes from the need to refer the electric field to the same 
physical plane. This requires a Fresnel term. It is straightforward to show that the w-term effect is significant if 
the field of view is comparable to or greater than the square root of the resolution (both measured in radians) : 



jpov 



resolution (3) 



The actual maximum allowed field of view depends upon the accuracy required in the peak fluxes. To track 
this, we introduce a parameter apov that quantifies how much smaller the field of view must be to attain a 
given level of accuracy. We will return to the estimate of this parameter later. 

resolution (4) 

We can now discuss the various algorithms for correcting the w term. There are three approaches to counter 
the non-coplanar baselines effect in use in ASKAPsoft - wprojection,!^ wstacking, snapshots,^ ^ and faceting.^l 

W Projection The w term can be expressed as a multiplicative effect in image space, or a convolution in 
Fourier space. In both cases, the w variable acts as a parameter. The convolution relationship between 
visibility on w = and an arbitrary w plane is: 

V{u, V, w) — G{u, V, w) (g) V{u, v) 

G(u, v,w)= f , e J2^("'+''™) dldm fr,\ 

G{u,v,w) « "''-J 

In figure [l] we show the three dimensional form of G{u,v,w), with the w axis running vertically. The 
envelope of this function obeys u/w ^ 6fov, and the number of fringes across the function scales as .13 

Before going further, we need to review processing for small fields where the w term can be ignored. The 
main task to be performed is then a Fourier Transform. This is nearly always best implemented via an 
FFT. The next question is how best to deposit the irregularly sampled uv points onto the regular grid used 
in the FFT. Usually an anti-aliasing filter (AAF) is used to deposit visibility samples onto the grid. This 
is usually called convolutional gridding and the AAF is called the gridding convolution function (GCF). 
Modern practice is to use a prolate spheroidal wavefunction as the AAF. 

If the w term is significant, then the GCF can be extended to include the effects of the w term. Numerically 
this involves multiplying G(Z, m, w) with the transform of the AAF and then transforming back to Fourier 
space. The w projection algorithm uses G{u,v,w) as a convolutional gridding function. 



Figure 1. The three dimensional form of G{u,v,w), with the w axis running vertically. It is Hermitean symmetric about 
the centre point, shows in red here. 




Since w projection is a convolution in Fourier (data) space, there is an alternate approach in which the 
data are partitioned in w, and 0(1,111,, w) is applied in image space by multiplying each w plane. We call 
this w stacking. 

The primary beam A(l,'m) can be treated in a similar way since it too is a multiplicative term in image 
space: 



V{u, V, w) — A{u, V, w) (g) V{u, v) 



A(u,v,w) — J A^ljTn) 
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(6) 



For obvious reasons, this has acquired the name 'a projection' or with the w term, 'aw projection'.!^ 
Hybrid versions are also possible in which, for example, the w term is addressed using stacking, and the 
primary beam via projection ('a projection/w stack'). The exact resources required for w projection are 
those needed to calculate and cache the GCF. The memory required can be very substantial and indeed 
prohibitive. Some saving in memory and substantial savings in computation can be realised from the 
observations that the non-zero part of the convolution is bounded by a cone, and that most values of w 
are much less than the maximum. 

For application of the primary beam using equation[6] the resources can be even larger. The GCF may vary 
with frequency and/or time, necessitating even frequent recalculation or caching. Nevertheless, application 
of the primary beam in this way is very valuable. 

Snapshots For a short or snapshot observation made with a coplanar array, neglect of the w-term results in a 
distorted image coordinate systemEHH This coordinate distortion can be corrected in the image plane by 
interpolation of the image to the correct coordinate system. For that short period of time, the array will 



be instantaneously coplanar to good accuracy. This means that the w coordinate is related to (u, v) by a 
simple relationship: 



w — au + bv 



(7) 



The relationship between visibility and sky brightness may be rewritten as a two-dimensional Fourier 
transform by introducing distorted coordinates (r,rn') where: 



Parameters a, b may be estimated by linear fitting to the u, v, w coordinates. For a telescope observing at 
zenith angle Z and parallactic angle x, the parameters a and b are given by: 



If performed as the sole means of correcting the non-coplanar baselines effect, the work required for the 
image plane reprojection can come to dominate. However, if used in conjunction with w projection (or aw 
projection), an optimum tradeoff between the resources needed for each may be obtained. We discuss such 
a hybrid algorithm below. 

The image reprojection step must be done with high accuracy so as not to misrepresent the model or 
residual image. For the prediction step this is particularly important since the model may not be diffraction 
limited as the residual image is. Our code currently allows only bilinear or bicubic interpolation. Lanczos 
interpolatiorP is likely to be much more accurate, even compared to a truncated sine function. 

Facets The w phase screen can be taken to be constant or linear over small regions of the sky.^l After applying 
the residual phase term, an image can be constructed by piecing together many different facets. This 
approach is expensive in terms of floating point operations but has a modest memory footprint.^ It can 
also be extended to deal with non-isoplanatisni. 

w snapshots A superior algorithm can be obtained by combining w projection and snapshots, w is expressed 
as a linear plane plus deviations Aw. 



The current best plane in m, v, w space is chosen by least squares fit. w projection is used to project all 
visibilities onto this current plane, thus correcting Aw;, and the snapshot imaging is performed and plane 
fitting is repeated when the deviation from the last plane exceeds a specified tolerance. 

We call this algorithm 'w snapshots'. 

The optimum solution depends on the context. For ASKAP, w snapshots provides an optimum use of CPU 
and memory resources. 

2.1 Scaling with Rp 

In this subsection, we derive and illustrate the fundamental scaling laws for the w term. 
The Fresnel number, Rp, measures the w term phase error: 



I' = l + a {Vl-P-m^ - 1) 
m' = m + b (\/ 1 — P — — l) 



(8) 



a = tan Z sin x 
b — — tan Z cos x 



(9) 



w = au + bv + Aw 



(10) 
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This is roughly equal to 

R^ = f. (12) 

where A is the observing wavelength, B is the baselines, and D is the antenna diameter. 

The w term can be considered significant when the Fresnel number (roughly the phase error) is comparable 
to unity. 

Rf ~ 1 (13) 

However, this is too loose a statement for careful evaluations of imaging performance, and we must introduce 
another parameter into this relationship. 

Rf ~ apov (14) 
In practice, these numbers may depend on some very important implicit factors. 

Primary beam sidelobes To reach high dynamic range the synthesized sidelobes of sources in the first, second, 
or third primary beam sidelobes may need to be subtracted. In that case the field of view @fov ^ X/D. 

High accuracy imaging To represent and subtract sources with high accuracy may require a more stringent 
limit on the field of view, in which case the number of pixels quoted above is an overestimate. 

In figure 2, we show the behaviour of the w term as a function of resolution and field of view. 

• The green lines show the size of an image for which the w term can just be ignored. 

• The solid blue line shows the maximum field of view for which the w term can be ignored. 

• The dashed blue lines show the Fresnel number Rf as a function of resolution. This measures the strength 
of the w term phase error. Larger Fresnel numbers require more computation for all algorithms. 

• All of these numbers are approximate, depending on the definition of field of view, resolution, acceptable 
phase error, etc. 



To the top right of this diagram, we see observations for which the processing will be inordinately expensive. 
Imaging in these regions is only possible if the amount of visibility data is sufficiently low. 

3. SCIENTIFIC PERFORMANCE 

The scientific performance can be quantified in multiple ways. Here we concentrate on smearing. All the wide- 
field algorithms described above are subject to decorrelation losses for sources far from the phase centre. Clearly 
this can have as deleterious an effect on the science as time and frequency smearing. In this subsection, we 
investigate the smearing for snapshot imaging and w projection. 



Figure 2. Behavior of w term for the different SKA telescopes: Rf ~ 1 (blue line), relative processing cost Rf (blue 
dashed line), required number of image pixels on each axis (green). The red lines show the behaviour with frequency, 
with low frequency to the left. The field of view for both SKAl.DISH is diffraction limited and that for SKA1_SURVEY 
is set by the size of the phased array feed. For SKAl_AA_LOW we show two curves - that for a single beam and that for 
480 beams. 
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3.1 Snapshot imaging 

For snapshot imaging, we wish to determine the maximum value of Wstep allowed before a point source drops in 

intensity by AA. 

The shift is a complicated function of parallactic angle and zenith angle. Since we are interested in the order 
of magnitude of the smearing, we will consider a simplified case. The shifts in position for a source that transits, 
seen at zenith angle Z= 45 degrees, are: 

Al = -Axa^ = (15) 

Am = +Axb— = +Ax— (16) 

Using a parabolic approximation to the PSF peak and using equations 7 and 9, we find by differentiation 
that the drop in amplitude is given by: 

1 / Am ^ ^ 



2 \ ©res / 

Al = eresV2AA (18) 
Choosing the worst case at the edge of the field: 

= Vs^^Vaa (19) 

^FOV 

Converting a change in parallactic angle to the equivalent increment in w, we find: 

/ Vsaa\ 

Wstep = Wrms I I (20) 

This is a very stringent limit for high Frcsncl numbers and high accuracy. 

We can now calculate the number of snapshots required. Suppose that we observe for hour angle range hobs- 
Then: 

-'Vstep = n,obs = n,obs (^J-j 

Wstep VSA^ 

The same caveats apply here as for the calculation of optimum Wmax'- these equations are approximate only. 
More accurate values will require simulation. 

Snapshot imaging has one substantial flaw. In the duration of the snapshot, the (u, w, w) coverage and hence 
the best fit plane can change. Hence source positions can be biased and the source structure can be smeared, to 
a level depending on the duration of the snapshot. The bias can in principle be reduced by fitting to a extended 
timc-ehunk of (u, v, w) but this approach would complicate the processing logic. The smearing cannot be easily 
fixed. 

Furthermore, if an array is actually mildly non-coplanar (because of the spherical earth, for example) then 
there will be uncorrected phase errors, leading to image blurring. This eff'ect will be especially strong at low 
elevation angles where projection magnifies any non-coplanarity. 



Figure 3. Position and shape errors incurred by use of snapshot imaging. Top panel shows a point source imaged with 
various values of Wstep, superimposed on an image of the true source. Bottom shows the same but with w projection 
added into the processing. 
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3.2 w projection 

A point source imaged via w projection is subject to decorrelation since the w projection kernel is sampled as 
discrete locations in u,v,w space and the value for neighbouring grid point points is used in place of the actual 
value. 

The convolution function may be implemented via direct calculation as needed or via a pre-calculated lookup 
table. The accuracy of the lookup table is limited by memory. Hence w projection can be made arbitrarily 
accurate in a way that the other methods cannot. This means that w projection is very well suited to be used 
in tandem with another algorithm such as snapshot imaging that suffers significant decorrelation errors. An 
additional advantage is that the balance between snapshot imaging and w projection can be adjusted to stay 
within memory and CPU constraints. 

The sampling requirements are: 



2TrOFOV ^ ^ 

V2KA ^^^^ 

2TTdFOV 



_ V2AA 

Hence the number of w planes required goes as: 



Aw=^^ (24) 



Wmax sinZ 

= Aw ('^^ 

Thus, just as for snapshot imaging, the effect of the w term may be limited by avoiding high zenith angles 
(low elevations). 

4. COMPUTING PERFORMANCE 

We now turn to investigate the computing performance - specifically the number of operations needed. We will 
ignore that for the FFT since it nearly always is a minor expense. 

4.1 Processing time for w projection 

The time for w projection goes as: 

'-^wprojection ^visl^wproj^F {^^) 



Note that fJ,wproj is for one visibility to one grid point. Typically for gridding a billion visibilities each to one 
grid cell is fJ-wproj 3 - 8 s. There is no dependency on the desired accuracy since the accuracy is determined 
by the precomputed convolution function. 



4.2 Processing time for snapshot imaging 

Let the maximum and rms w baseline be w„,„-r. w 



When the rms basehne has moved iVsien then the 



plane must be transformed, reprojected, and accumulated. Let the range in hour angle be hobs radians. For 
snapshot imaging to construct an image over an hour angle range hobs, the time scales directly as the number of 
reprojections: 



'-^snapshot ^ ^visl^wproj^aa ^~ ^pixell^^^P^oj hobs' 



Hp 



(27) 



where Raa measures the size of the anti-aliasing filter in pixels (typically 7-9 per axis). 

Empirically we find that on a typical desktop, the time to reproject an image of size 8192 by 8192 is about 
200s so, scaling to a million pixels, we find that the cost of reprojecting a million pixels is fJ-reproj ~ 3s. 

The scaling with Rp is only linear, compared to quadratic for w projection, but note that the required 
accuracy AA is a factor. 

4.3 Processing time for w snapshots 

The calculation for the w projection part of the processing time depends on the rms w after applying the cutoff 
Wstep- For the moment, we will assume that it is Wstep- Then the time for w snapshots goes as: 



snapshots ^pixelP^'^^PTojhobs 



Or if p = Wstep/Wrms- 



Wr 



^step 



^pixelP"reprojhobs 



^visl^wprojRp 
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(28) 



(29) 



The optimum value of p is: 



C^pixelPreprojhobs \ 
^■^vis^wprojRp 1 

And for this value, the processing time is (ignoring terms 0(1)): 



(30) 



snapshots, opt (^{^-^pixelf^f^-proj^obs^ -^visf^wproj^F 



(31) 



Thus w snapshots has substantially better scaling with Rp than either w projection or snapshots. The 
improvements in absolute performance are: 



' w snap shots, opt 
T 

wprojection 



^pixelP'reprojhobs 
^vis P'wproj ^p 



(32) 



f w snap shot 8, opt 
^snapshot 



wproj 

Rp yN^ixelP'reprojh'obs 



(33) 



Wc can illustrate these results by simulations. In figure 4, we show results from a simulated long observation 
with the SKA1_AA_L0W, for baselines up to 40000A. The total run time shown here is the sum of the times for 
regridding images, degridding visibilities, and constructing the w projection convolution functions. The curves 



Figure 4. Execution time as function of transition from snapshot imaging to w projection. The curves show different 
integration times: Is (red), 3s (green), 10s (blue). For shorter integration times, the need to grid more visibility points 
shifts the optimum transition point lower, thus favouring snapshot imaging over w projection. 



Total run time versus transition w, for different integration times. Fresnel number ~ 80 
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Table 1. Scaling of processing time with Fresnel number Rf 



Algorithm 


Scahng of time 


w projection 

snapshots 
w snapshots 


lip 
Rp 

p2/3 
Up 



show different integration times: Is (red), 3s (green), 10s (blue). For shorter integration times, the need to 
grid more visibiUty points shifts the optimum transition point lower, thus favouring snapshot imaging over w 
projection. For the Is integration case, the improvement of w snapshots over pure w projection is substantial - 
just under a factor of 3. For all integration times, small values of the transition point (less than 10000 in this 
example) are disfavoured. 

Although the timing ratios can be expected to be qualitatively correct, in practice, it may be preferable to 
search numerically for the optimum value of Wstep, perhaps as an autotuning procedure. 



5. DISCUSSION 



Table 1 summarises the scaling with Rp for the three algorithms. Our new algorithm, w snapshots, has consider- 
ably better scaling than w projection, and has better scaling than snapshots, while avoiding the shape distortions 
and biases of the latter. If we take the number of visibilities and the total number of pixels to be the same, 
hobs ^ 1, and also take ^reproj ^ fJ-wproj, then w snapshots is faster than w projection by a factor R^^. 

Furthermore, w snapshots is superior to snapshot imaging for few visibilities, many pixels, and high desired 
accuracy, and is superior to w projection for few pixels, many visibilities. 

The processing time (equation 31 1 contains terms related to the observation N'^^^^^, hobs, N^s, Rp and terms 
related to the computer performance Hreproj, l^wproj- The latter two terms can be optimised. We have expended 
significant effortP on optimising the gridding cost fiwproj but no effort on ^reproj so there may be significant 
improvements yet to be had. 

Referring back to figure 2, we can see that these improvements in scaling will have the effect of decreasing the 
computational cost for SKA imaging substantially, an order of magnitude or more for SKA1_DISH, for example. 
Verifying this prediction will require improvement in the ability of our software to handle very large images 
(10® — 10^° pixels), and is deferred to a subsequent paper. 
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